Machine Generation of Arabic Diacritical Marks

Authors

  • Moustafa Elshafei
  • Husni Al-Muhtaseb
  • Mansour Alghamdi

Abstract

The absence of vowelization marks from modern Arabic text is a major obstacle for machine translation and other text-understanding applications. In this paper, we formulate the automatic generation of Arabic diacritical marks from unvoweled text using a Hidden Markov Model (HMM) approach. The model treats the word sequence of the unvoweled Arabic text as the observation sequence and the possible diacritized forms of the words as the hidden states. The optimal sequence of diacritized words (states) is then obtained efficiently with a dynamic programming algorithm. We present the basic algorithm and its evaluation, and discuss its limitations as well as various ways of improving its performance.
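The abstract does not name the dynamic programming algorithm; for decoding an HMM the usual choice is the Viterbi algorithm. The following Python sketch illustrates that idea under stated assumptions and is not the authors' implementation: it assumes a word-bigram model over diacritized forms (log P(cur | prev)), a deterministic emission in which a diacritized word can only produce its own unvoweled spelling, and a fixed log-probability floor for unseen word pairs. The helper names (`strip_diacritics`, `build_candidates`, `viterbi`, `demo`), the floor value, and the toy probabilities are all hypothetical.

```python
import math
from collections import defaultdict

# Illustrative sketch only. Assumptions (not from the paper): a word-bigram
# model over diacritized forms and a fixed floor for unseen word pairs.

# Arabic diacritic code points U+064B (fathatan) through U+0652 (sukun),
# covering tanween, the short vowels, and shadda.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def strip_diacritics(word: str) -> str:
    """Map a diacritized word (hidden state) to its unvoweled form (observation)."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def build_candidates(lexicon):
    """Group the diacritized forms in a lexicon by their unvoweled spelling."""
    table = defaultdict(list)
    for form in lexicon:
        table[strip_diacritics(form)].append(form)
    return table


def viterbi(observations, candidates, log_bigram, start="<s>", floor=-20.0):
    """Return the most probable sequence of diacritized words.

    log_bigram[(prev, cur)] holds log P(cur | prev); unseen pairs get `floor`,
    a crude stand-in for real smoothing. Emissions are deterministic: a state
    can only emit its own unvoweled form, so only candidates for `obs` are tried.
    """
    # best[state] = (log score of the best path ending in state, that path)
    best = {start: (0.0, [])}
    for obs in observations:
        states = candidates.get(obs, [obs])  # out-of-lexicon word: keep it as-is
        new_best = {}
        for cur in states:
            # Dynamic programming step: extend the best-scoring predecessor path.
            score, path = max(
                ((s + log_bigram.get((prev, cur), floor), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


def demo():
    F, D = "\u064E", "\u064F"                  # fatha, damma
    walad = "الو" + F + "ل" + F + "د" + D       # al-waladu, "the boy" (partially vowelized)
    kataba = "ك" + F + "ت" + F + "ب" + F        # kataba, "he wrote"
    kutub = "ك" + D + "ت" + D + "ب"             # kutub, "books"
    candidates = build_candidates([walad, kataba, kutub])
    # Toy log-probabilities, invented purely for illustration.
    log_bigram = {
        ("<s>", walad): math.log(0.5),
        (walad, kataba): math.log(0.3),
        (walad, kutub): math.log(0.01),
    }
    return viterbi(["الولد", "كتب"], candidates, log_bigram)


if __name__ == "__main__":
    print(demo())
```

In the toy run, the unvoweled word كتب is ambiguous between kataba and kutub, and the bigram context after "the boy" selects kataba. Keeping the full best path per state instead of back-pointers keeps the sketch short at the cost of memory; a practical decoder would store back-pointers and use a properly smoothed language model.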


Similar Resources

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may post offensive tweets aimed at others, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text of tweets via one of the machine l...

Developing a New System for Arabic Morphological Analysis and Generation

Arabic morphology poses special challenges to computational natural language processing systems. Its rich morphology and the highly complex word formation process of roots and patterns make computational approaches to Arabic very challenging. In this paper we present an approach for morphological analysis and generation of Modern Standard Arabic (MSA). Our approach is based on Arabic morphologi...

Morphological Analysis and Generation for Machine Translation from and to Arabic

In this paper, we present the importance of machine translation and the need for linguistic treatment in the transfer-based approach. We then present our method for analysis and generation based on the linguistic features of Arabic words, using the scheme concept to extract morphological information; this information is very useful in tree generation and structural transfer.

Syntactic Generation of Arabic in Interlingua-based Machine Translation Framework

Arabic is a highly inflectional language, with a rich morphology, relatively free word order, and two types of sentences: nominal and verbal. Arabic natural language processing in general is still underdeveloped and Arabic natural language generation (NLG) is even less developed. In particular, Arabic natural language generation from Interlingua was only investigated using template-based approa...

Publication date: 2006